Search Datasets in Literature: A Case Study of GWAS

Authors

  • Xiao Dong
  • Yaoyun Zhang
  • Hua Xu
Abstract

One of the missions of the NIH BD2K (Big Data to Knowledge) initiative is to make data discoverable and to promote the re-use of existing datasets. Our ultimate goal is to develop a scalable approach that can automatically scan millions of scientific publications and identify their underlying datasets. Using Genome-Wide Association Studies (GWAS) as a use case, we conducted an initial study to identify GWAS dataset attributes in MEDLINE abstracts by developing a hybrid approach that combines domain dictionaries and pattern-based rules. The automatic GWAS dataset attribute recognition system achieved an F-measure of 84.85%. We then applied the system to index MEDLINE abstracts and built an online GWAS dataset search engine called "GWAS Dataset Finder". Our evaluation showed that the GWAS Dataset Finder significantly outperformed PubMed in retrieving literature with the desired datasets. Our study demonstrates the potential of text mining methods for building a data discovery index: such methods can produce a better index of the literature linked to its underlying datasets, thus improving data discoverability.
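The hybrid approach mentioned in the abstract (domain dictionaries combined with pattern-based rules) could look roughly like the sketch below. The abstract does not list the actual dictionaries, rules, or attribute types, so everything here is an illustrative assumption: the term sets `PLATFORMS` and `ETHNICITIES`, the `SAMPLE_SIZE` pattern, and the function names `extract_attributes` and `f_measure` are hypothetical and are not the authors' implementation.

```python
import re

# Hypothetical mini-dictionaries (assumption): the paper's real domain
# dictionaries are not given in the abstract.
PLATFORMS = {"affymetrix", "illumina"}
ETHNICITIES = {"european", "han chinese", "african american", "japanese"}

# Hypothetical pattern rule (assumption): a count followed by a cohort noun,
# e.g. "2,000 cases" or "1500 controls".
SAMPLE_SIZE = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+(cases|controls|individuals|subjects|patients)\b",
    re.IGNORECASE,
)


def _dictionary_hits(terms, lowered_text):
    """Return the dictionary terms that occur as whole words in the text."""
    return sorted(
        t for t in terms
        if re.search(r"\b" + re.escape(t) + r"\b", lowered_text)
    )


def extract_attributes(text):
    """Combine dictionary lookup and pattern rules on one abstract."""
    lowered = text.lower()
    return {
        "platform": _dictionary_hits(PLATFORMS, lowered),
        "ethnicity": _dictionary_hits(ETHNICITIES, lowered),
        "sample_size": [
            (int(number.replace(",", "")), group.lower())
            for number, group in SAMPLE_SIZE.findall(text)
        ],
    }


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = (
        "We genotyped 2,000 cases and 1500 controls of European ancestry "
        "on the Illumina platform."
    )
    print(extract_attributes(sentence))
    print(f_measure(tp=8, fp=2, fn=2))
```

Dictionaries suit closed vocabularies such as platform names, while patterns suit open-ended values such as sample sizes; a system of this kind would typically be scored against manually annotated abstracts with a measure like `f_measure`.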

Similar resources

Network-Assisted Investigation of Combined Causal Signals from Genome-Wide Association Studies in Schizophrenia

With the recent success of genome-wide association studies (GWAS), a wealth of association data has been accomplished for more than 200 complex diseases/traits, proposing a strong demand for data integration and interpretation. A combinatory analysis of multiple GWAS datasets, or an integrative analysis of GWAS data and other high-throughput data, has been particularly promising. In this study,...

Amodiaquine-Associated Asthenia: A Case Based Review and Gaps in Literature

Introduction: Amodiaquine is a partner drug in the artemisinin-based combination therapy artesunate-amodiaquine. Reports of the adverse drug reaction known as amodiaquine-associated asthenia are scarce, and this adverse reaction needs to be investigated in detail. This article presents and reviews a case of amodiaquine-associated asthenia. A literature search for the characteri...

Subcutaneous Emphysema as a Complication of Tonsillectomy: A Systematic Literature Review and Case Report

Introduction: Subcutaneous and mediastinal emphysema is a rare complication after tonsillectomy. This case presentation and literature review summarizes the existing literature on this unusual complication.  Materials and Methods: This study presents a case of a 21-year-old man who developed a cervical subcutaneous emphysema 6 days after tonsillectomy, whereby conservative treatment produced sp...

Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...

Intelligent Tuned Harmony Search for Solving Economic Dispatch Problem with Valve-point Effects and Prohibited Operating Zones

Economic dispatch with valve point effect and Prohibited Operating Zones (POZs) is a non-convex and discontinuous optimization problem. Harmony Search (HS) is one of the recently presented meta-heuristic algorithms for solving optimization problems, which has different variants. The performances of these variants are severely affected by selection of different parameters of the algorithm. Intel...

Journal title:

Volume 2017, Issue

Pages -

Publication date: 2017